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Abstract 

We study numerically the effect of sequence heterogeneity on the thermodynamic prop- 
erties of a Poland-Scheraga model for DNA denaturation taking into account self- 
avoidance, i.e. with exponent Cp = 2.15 for the loop length probability distribution. 
In complement to previous on-lattice Monte Carlo like studies, we consider here off- 
lattice numerical calculations for large sequence lengths, relying on efficient algorithmic 
methods. We investigate finite size effects with the definition of an appropriate intrinsic 
length scale x, depending on the parameters of the model. Based on the occurrence of 
large enough rare regions, for a given sequence length N, this study provides a qual- 
itative picture for the finite size behavior, suggesting that the effect of disorder could 
be sensed only with sequence lengths diverging exponentially with x. We further look 
in detail at average quantities for the particular case x = 1.3, ensuring through this 
parameter choice the correspondence between the off-lattice and the on-lattice studies. 
Taken together, the various results can be cast in a coherent picture with a crossover 
between a nearly pure system like behavior for small sizes N < 1000, as observed in 
the on-lattice simulations, and the apparent asymptotic behavior indicative of disorder 
relevance, with an (average) correlation length exponent Ur > 2/d (= 2). 
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1 Introduction 



The discovery of the DNA double-hehcal structure, some 50 years ago, motivated the elab- 
oration of the hehx-coil model to account for the separation of the two strands, on physical 
bases [H Ej [3]. The importance of this model from the biological point of view is obvious, 
since processing of the genetic information involves precisely the separation of the strands. 
Of course, under physiological conditions, the opening of the double-helix is not under the 
effect of temperature, but the differential stabilities in DNA sequences, as revealed by helix- 
coil analysis, could be sensed by biological effectors, such as proteins, under various types 
of constraints. The successful development of the helix-coil denaturation model required ap- 
propriate elaborations for the physics and the algorithmics, allowing accurate tests through 
comparisons with experimental data (melting curves). This field, very active in the sixties 
and seventies, has benefited recently from a renewed interest both from the biological side, 
for example in the context of genomic analysis, and from the physics side, notably in relation 
with questions relevant to the order of the transition in the homogeneous case and the effect 
of sequence heterogeneity. In the light of these still debated issues, both from the theoret- 
ical and the numerical points of view, the main focus of the present work is the numerical 
investigation of the relevance of disorder in a realistic DNA denaturation model d la Poland- 
Scheraga, in which self-avoidance between loops and the rest of the chain is also taken into 
account. In what follows, before further detailing the particular system considered and the 
open questions, we first recall briefly the general background in terms of biological models, 
numerical methods and previous results. 

Basics for DNA denaturation: DNA denaturation is an entropy driven transition, in which at 
some critical temperature Tc the energy loss AE with the opening of base pairs is compensated 
by the entropic gain TAS associated with the increased number of configurations accessible 
to the separated single strands. Experimentally, it is found that depends on different 
factors, in particular the pH of the solution and the GC composition of the sequence, related 
to the ratio of the Guanine-Cytosine, GC, pairs to the Adenine-Thymine, AT, pairs. For 
homogeneous sequences, for pH ~ 7, typical values for Tc are Tc^gc ~ 110 °C and Tc^at ~ 
70 ^C, respectively for GC and AT cases. Such differences reflect of course the fact that the 
pairing of Guanine to Cytosine involves three hydrogen bonds whereas that of Adenine to 
Thymine involves only two. 

For a given biological sequence of length A^, here identified, following AT and GC pairs, 
by the coupling energies {e^, z = 1, . . . , A^}, the denaturation transition can be followed with 
UV absorption. Correspondingly, the fraction 6^{T,N) of closed base pairs, which is the 
order parameter of the transition in the thermodynamic limit N ^ oo, can be measured 
in such experiments based on differential absorptions for closed and open base pairs. The 
resulting curves display usually multi-stepped structures, with abrupt variations on small 
(sequence-depending) temperature ranges around Tc. Therefore, for a biological sequence of 
fixed length, the finite size order parameter 6e{T, N) varies from zero to one (associated with 
complete denaturation), with a sequence-dependent behavior. Accordingly, the derivative 
with respect to temperature, —d9^(T,N)/dT, displays typically a series of sharp peaks. 



2 



From the theoretical point of view, modehng DNA denaturation was essentially follow- 
ing two main directions: 1) for biological applications, in relation with melting experiments 
(sixties, seventies), sequence- dependent algorithmic elaborations for the handling of realistic 
physical models [2], HI [5], concerning notably the representation of denaturation loops, and, 2) 
for the study of the underlying physics, detailed characterizations of the properties for pure 
systems, neglecting sequence-specificity [SlElElElIinillllinillSlIIllIISlE]. 

Physics of DNA denaturation for homogeneous sequences: DNA denaturation is understand- 
able in the framework of almost unidimensional systems [18], and it is therefore associated 
with a peculiar kind of transition. In fact, the first models displayed no thermodynamic sin- 
gularity [2], as they corresponded to Id Ising models with only short-range (nearest-neighbor) 
interactions, with open and closed base pair states represented by an Ising spin. It was subse- 
quently shown, notably by Poland and Scheraga [S| (PS, in what follows), that the observed 
denaturation behavior can indeed be described in terms of a simple Id model, the helix-coil 
model, that consists of alternating regions of contiguous open base pairs (coiled regions or 
loops) and double-stranded ones (helical segments). In this model the transition in the ther- 
modynamic limit is made possible through the adoption of appropriate long-range entropic 
weights for the single-stranded loops. 

More recently, several other models have been considered and studied, using in particular 
more realistic potential forms between base pairs [H [9] . Since sharp transitions are observed 
experimentally, with abrupt changes in 6^{T,N) on small temperature ranges, it is expected 
that a model, accounting correctly for such results, should undergo a first order transition 
in the pure case. Indeed, this point has been studied rather extensively recently [31 El El [101 
[TTl \T2[ [131 [m [IS [m [IZj- In particular, it was demonstrated [TU] that the transition is of 
first order in pure PS models in which excluded volume effects for loops are not only with 
themselves, but also with the rest of the chain. Notably, with the probability distributions for 
loops with lengths / at the critical point following a power law, P(/, Tc) oc l/l'^'^, the transition 
is of first order for Cp exponents larger than 2 [HI [3 [TS] (see also [3]). It was shown that in 
three dimensions, with the two strands described as self-avoiding walks (SAWs), the value for 
the exponent is Cp ~ 2.15 [TOl |12l [TU [T5|. In comparison, Cp = 3/2 for random walk (RW) 
loops [6] and Cp = 1.76276(6) for SAW loops, with excluded volume interactions with the rest 
of the chain neglected [3 [T9] . 

Biological and algorithmic backgrounds for sequence- specific DNA denaturations: The algo- 
rithmic problem was initially encountered for the implementation of sequence-specific calcu- 
lations allowing notably experimental/theoretical comparisons in the study of melting curves. 
It seemed natural, in the beginning, to resort to transfer matrix formalisms as developed in 
physics because of the Ising-type formulation of the problem [2]. Indeed, neglecting loop- 
entropy long-range effects, the calculation of the partition function for a sequence of size 
can be expressed simply as the product of N 2x2 matrices. The extension to realistic models 
was at first handled through extended transfer matrices, of sizes growing up to NxN, for the 
proper description of interactions throughout the lengths of the sequences [2]. Because of 
calculation burdens associated with such matrix sizes, alternative formulations were sought 
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for the implementation of realistic models with affordable computation times. Representing 
the culmination of a series of developments, through some twenty years, an appropriate al- 
gorithmic solution was proposed in 1977 by Fixman and Freire [1] (FF method), in which 
calculation efficiency was not at the price of oversimplifications in the physics, but relied in- 
stead on the numerical representation of the long-range effect as a multiexponential function. 
In this formulation, the time complexity for the evaluation of a complete denaturation map 
for a sequence of length is essentially proportional to N. This reduced complexity is to 
be compared with the intrinsic complexity of the model scaling as A^^, if we were to consider 
exact one-way calculations along the sequence. 

In this background, no generalizations were proposed for the ideas in the FF for a long 
period of time, possibly because of the formulation of this method in the prolongation of an 
algorithm by Poland [20], expressed in the rather specialized context of conditional proba- 
bilities recursions specific to the linear DNA helix-coil model. As a matter of fact, the only 
applications of the FF concerned the implementation of the algorithms in computer programs, 
for DNA melting calculations (such as in the POLAND [21] or in the MELTSIM programs 
[22]). However, upon revisiting the original derivations, it appears that the idea associated 
with the multiexponential representation, relying on the fundamental property of exponential 
function, corresponds to a powerful concept amenable to many generalizations for realistic 
models with long-range effects. Accordingly, based on explicit partition function calculations, 
the SIMEX (SIMulations with EXponentials) method was first derived as a reformulation of 
the FF for the linear helix-coil model [5], with further generalizations to higher-order models 
involving several, mutually coupled, long-range effects [25112^ . For such systems, with two or 
more long-ranges, the reductions of complexities by several orders of magnitudes can be as- 
sociated with calculation times reduced by million folds. The basic concepts for higher-order 
models were originally illustrated with a circular DNA helix-coil model-problem involving 
two long-range contributions [23] . The corresponding principles were further transposed to a 
linear helix-coil model with non-symmetrical loops |24j . 

On the experimental side, linear helix-coil models were successfully compared with exper- 
imental denaturation curves [22] . In the beginning, one of the motivations in the elaboration 
of DNA denaturation models was to ask the question of possible relations between genetics 
(coding/non-coding) and physical (helix/coil) segmentations, because of the importance of 
the separation of the two strands in the processing of the genetic information. With little 
genomic sequences available at the end of the seventies and beginning of eighties, no clearcut 
conclusions were reached for such relations. More recently, with the availability of complete 
genomes, it was possible to resume such investigations on much larger bases, demonstrating 
variable correspondences between the two types of segmentations, depending on the genomes 
[25| [26| [271 [28] • For genomes with very sharp correlations it was even possible to propose ab 
initio gene identification methods purely on physical bases [2B] . 

Physics of DNA denaturation with disorder: On the physical side, DNA denaturation models 
are associated with important open questions such as, notably, the relevance of sequence- 
heterogeneity for the thermodynamic limit behavior. At the beginning of the seventies, it was 
noted by Poland and Scheraga [2] that "sequence heterogeneity dramatically broadens the 
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transition", but it is only recently that such problem has been addressed on more rigorous 
bases and the transition in disordered PS models with Cp > 2 was also investigated [221 EDI 
[311 [32| [33| Indeed, in the homogeneous case, such systems exhibit peculiar first order 
transitions characterized by a diverging correlation length, and it is therefore not clear to 
which extent general theoretical results on the effect of disorder [351 ESI [37] can be applied. 
The question has been addressed both with analytical [521 [55| [M] and numerical approaches, 
with either off-lattice [221 [30] or on-lattice [21] implementations in this latter case. It appears 
however hard to reconcile these various results. The off-lattice studies of [29l [30] , involving 
very large chain lengths, suggest a peculiar transition, of first order as in the pure case but 
not obeying usual finite size scaling and exhibiting two different correlation length exponents, 
associated respectively with typical and average quantities. On the other hand, the Monte 
Carlo like numerical simulations in [HT], limited to small chain lengths, agree with a second 
order transition in the presence of disorder, though it was not possible to rule out completely 
a transition still of first order. Finally, from the analytical standpoint, under quite general 
hypotheses encompassing PS models with Cp > 2, it was shown that the transition is expected 
to be at least of second order and possibly smoother [32l [331 Ell- 
in the background above, in addition to their relevance to experimental DNA denaturation, 
PS models with sequence heterogeneity represent interesting toy-models for addressing general 
open questions relative to the proprieties of random fixed points. The detailed study of such 
systems could also help elaborating the correct approaches to be used in the interpretation 
of data on disordered models. In this direction, we perform here a numerical analysis of 
a disordered PS model with Cp = 2.15. Relying on appropriate algorithmic formulations 
(SIMEX) we consider long sequences. With the definition adopted for the model, and the 
choices for the parameter values, the calculations are made directly comparable with previous 
on-lattice results [31]. In addition, the observed behavior should be also related to that found 
in the previous off-lattice studies of a different disordered PS model with Cp = 2.15 [291 [30] . 
in which the same multiexponential representation for the long-range loop entropy law was 
adopted. 

Our findings show the existence of very strong corrections to scaling. Moreover it appears 
that, for a given size, the effect of disorder is qualitatively described by an appropriately 
defined intrinsic length scale x depending on model parameters. These observations provide 
a possible explanation for the discrepancies between previous results [2^ [501 [5T] . as well as 
for an apparent dependence of the evaluated critical exponents on model parameters noted 
in [31]. In fact, in the frame of the picture proposed here, the size at which the effect of 
disorder becomes evident could diverge exponentially with x. More precisely, for the value 
X = 1.3 chosen for the present detailed study, it is possible to observe a crossover between a 
nearly pure system like behavior, consistent with the one observed in simulations [HT], and the 
apparent asymptotic one. With corrections to scaling taken into account, the model clearly 
displays a smooth transition, corresponding to a value for the correlation length exponent 
Ur > 2/d {= 2). Nevertheless, since our results refer to average quantities, they do not rule 
out the possibility suggested in [291 EOl [Ml [39] of a transition governed by two different 
correlation lengths. The analysis for the clarification of this point is left for a forthcoming 
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work [10]. 



2 Models a la Poland- Scheraga 

2.1 The pure case and the role of the exponent Cp 

Pure PS models for DNA denaturation are described in a rather extensive literature, and in 
particular the ingredients which make the transition of first order are discussed in several 
recent works [5], [101 [HI [121 [131 [IH [13 [IS]- Here we will only recall results for linear PS 
models with symmetric loops, in which one fully takes into account self-avoidance through an 
appropriate choice of the loop length distribution probability exponent Cp. 

The position of a base pair along the sequence is labeled hy i {i = 1,...,N) and its 
configurational state is represented by Sj, with Sj = 1 for a closed pair and Sj = for an 
open pair. In the corresponding on-lattice representation, the two strands in the model can 
be visualized as two interacting RWs with the same origin in d dimensions. A pair is in the 
closed state if and only if the two bases are in the same position i along the two strands and 
occupy the same lattice point [13] (RW-DNA). One can write the canonical partition function 
for the system in the form: 



where M{E) is the number of configurations with the same total energy E and (3 = 1/T is 
the inverse temperature, taking Boltzmann constant /cg = 1 for simplicity. 

The contribution of a single base pair in the closed state to the total energy E is AE = 
— e, independent from the position i in the pure case. The number of configurations of a 
closed segment of length n increases as /i" and therefore its contribution to the total entropy 
S{E) = log J\f{E) is AS = nlog/i. Here /i is a parameter of the model, interpretable as 
the on-lattice connectivity constant (/i = 2d for rf-dimensional RWs on a cubic lattice). A 
denatured region of length / is associated with a single-stranded loop of length 21, and the 
corresponding number of configurations is given by n'^^/{2lY^. The factor l/(2/)^f takes into 
account the fact that the two separated chains have to meet again at some distant point, and 
the relation Cp = d/2 holds for (i-dimensional RWs. 

Assuming that the first and the last base pairs are always coupled, a given configura- 
tion of the system is described by the lengths of the closed segments {rij} and by those 
of the denatured loops {h}, with + J2ki^k — 1) = ^tot + hot = N. Its energy is 

E = = —^^tot, only depending on ritot- Therefore, in the partition function, the 

degeneracy factor N'{E) includes both the entropic contribution of segments /i"'"' and that 
of loops l/{2lkyp, the latter being to be summed over all the possible sets {Ik > 2} 

associated with a given total length hot = "^ki^k — 1). Correspondingly, the partition function 
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can be also written as: 



ntot {lk>2} k 



n ro/.^ 



It can be noted that the factor e^^'"^'^ contributes an additive constant to the entropy and 
it is therefore not relevant for the description of the thermodynamics of the system. In what 
follows, we study properties of Z*^ = Z^/e^^^"^^. Accordingly, the number of loops of length 
/fc is taken equal to 1/(2/^)'^^ and a negative contribution to the entropy AS = — log/i is 
associated to each base pair in the closed state. It is seen qualitatively that the possible 
change in the thermodynamic limit behavior occurs at the temperature for which jSce ~ log fi. 
From the last expression in ([2]) it is moreover clear that, in the computation of the grand 
canonical partition function Z, the contributions of helical segments and loops are decoupled 
and one obtains a geometric series [21 El El [121 [IB] : 



Z 
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where we introduced the fugacity z and Zs and Zl refer to the segment (helical) and loop 
(coil) grand partition functions respectively: 
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(4) 
(5) 



Since the behavior for ^ cxd is dictated by the fugacity value z* corresponding to the 
pole nearest to the origin, the system undergoes a phase transition when a critical temperature 
Tc is found below which the zero of the denominator becomes smaller than the smallest pole 
of the numerator. The possibility of the transition and its order both depend on the value of 
the exponent Cp [21 [6l [71 [T21 [H]. In detail, for Cp < 1 there is no thermodynamic singularity, 
whereas for Cp > 1 the following situations must be distinguished: a smooth transition for 
1 < Cp < 3/2 with a specific heat exponent < 0, a second order transition for 3/2 < Cp < 2 
and finally a first order transition for Cp > 2. In fact, these distinctions can be understood 
considering the properties of Zl, i.e. those of the distribution probabihty of the loop length 
at the (possible) critical point P{l,T^ = l/{2iyp: 



5^/p(/,r,: 



oo for Cp < 1 



oo 



for Cp < 2. 



const for Cp > 2 



(6) 
(7) 
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In the case Cp < 2, the mean length (/) for loops at the critical point diverges and the system 
exhibits large coiled regions, in which most of the bases are involved. On the contrary, for 
Cp > 2, the mean loop length at Tc is finite and correspondingly it is possible to show that 
the density of closed base pairs 6{T,N) = {n)/N, and therefore the energy density, varies 
abruptly in the thermodynamic limit, from the value zero at high temperatures to a finite 
value at T^. In detail, when approaching the critical point from the low temperature (helical) 



For example, the RW-DNA model in three dimensions exhibits a second order transition with 
ap = 0, since Cp = 3/2, whereas in five dimensions it undergoes a first order transition, since 
Cp = 5/2, as confirmed both by exact computations of thermodynamic quantities and by 
on-lattice numerical simulations [T3] . 

2.2 Exponent value Cp = 2.15 

All interactions between different loops and helical segments are neglected in classical calcula- 
tions of the grand canonical partition functions for helix-coil models. It is possible to account 
for self- avoidance of each loop with itself through the appropriate choice of the exponent 
Cp = 1.76276(6) [Zl[l9], corresponding roughly to the value adopted usually for comparisons 
with experimental data [22]. More recently, it was demonstrated that self- avoidance of the 
loops with the rest of the chain can be also taken into account, and that intriguingly the 
pure PS models exhibit first order transitions in this case [IHl [HI [12] • The exponent Cp, cor- 
responding to a self-avoiding loop embedded in a self-avoiding chain, can be predicted from 
conformal theory results [H], and in particular it was found that Cp ~ 2.15 in three dimen- 
sions, the transition being of first order also in d = 2. It is notable that such determination 
provides the appropriate value of Cp to be used as an input in off-lattice calculations. 

In the Monte Carlo like simulations, one studies an on-lattice model (SAW-DNA) in which 
self-avoidance is completely taken into account, by considering two interacting SAWs, with 
two monomers allowed to occupy the same lattice point if and only if their positions along 
the two chains are identical, thus representing complementary base pairs [13]. In the pure 3d 
case, it was found that this system exhibits a first order transition, with the maximum of the 
specific heat diverging linearly with the chain length. It was subsequently shown [T^ [T5j that 
the value of the exponent describing the probability distribution for the loop lengths at the 
critical point is in perfect agreement with the theoretical prediction, Cp ^ 2.15. An off-lattice 
pure PS model with Cp = 2.15 was also studied numerically [12], finding the same scaling 
behavior than in 3d SAW-DNA, apart from the strong finite size corrections which appear to 
be more important in the on-lattice situation. 

Even though of first order, the transition is characterized by a diverging correlation length, 
which can be identified from the behavior of P{1, T) [TOl [T2| [T5] : 



phase, one finds [21 [SI [3 [121 [IB] : 




for 1 < Cp < 2 
for c„ > 2 



P(/,T) oc - 



(9) 
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with 



e(T) oc (T - T,)-"^ for T T' 



(10) 



where Up = 1 for Cp > 2 and z/p = l/(cp — 1) for second order (or smoother) transitions. 
It can also be predicted that the free energy density /(T) takes the value zero in the high 
temperature phase and that it behaves proportionally to 1/^(T) for T — >• T~, leading again 
to the behavior of the energy density and of the order parameter for different Cp values given 
in ([H]). Therefore, the hyperscaling relation ap = 2 — Up is clearly fulfilled, both for Cp < 2 
and for the first order [op = 1) case. 

2.3 Effect of disorder: previous results 

Disorder is introduced to account for sequence-heterogeneity, with parameter values depending 
on the chemical nature of base pairs (AT or GC) at a given position along a sequence. 
There are a few studies on the effect of disorder on general properties of DNA denaturation 
models in which self-avoidance is neglected |^ HU HSl HHl H?]. Previous numerical works on 
disordered models d la PS was mainly for comparison of the predictions with experimental 
data and genetic signals [221 ESI ESI EH EH] and for the study of the effect of base pair 
mismatches [21], where one usually takes also into account the stacking contributions, with 
the coupling energies depending on the chemical nature of base pairs at positions i and i + 
For comparisons with experimental melting curves, it is moreover necessary to take into 
account the possibility for complete dissociation of the two strands in the molecule [2], HHj • 
We also notice that biological sequences exhibit long-range correlations and strong variability 
in GC compositions, both according to genomes and within chromosomes. Letting aside for 
the present such sophistication, we sum up some recent results for simple disordered models 
d la PS with self-avoidance, such as those considered in off-lattice [29l [30] and on-lattice [31] 
studies, which allow nevertheless to capture essential features of the effect of disorder. 

In these various works, disorder enters only through the position dependent contribution 
of a closed base pair to the total energy. In detail, the {ej} are quenched random variables 
distributed following a binomial probability, corresponding to GC composition equal to 1/2: 

P{e) = \[S{e-e^r) + S{e-eGc)], (11) 

with Cat < ^gc- One is interested in the thermodynamic properties of the quenched free 
energy density: 



f{T) = hm /,(T, A^) = hm /,(T, A^) = - lim -^log^iv,., (12) 

where, as usual, (■) denotes the average over disorder and fe(T, N) is a self-averaging quantity 
in these models [32] . 

Generally speaking it is known, from the theoretical point of view, that disorder can 
modify very significantly the fixed point of a system, and therefore its critical exponents. In 
what follows, the notations with subscripts p and r refer to the (possibly different) pure and 
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random system fixed points, respectively. A series of results [321 EHl EZl, and notably the 
well known Harris criterion [3S], demonstrate that disorder is relevant as soon as the specific 
heat exponent fulfills the condition > 0. Correspondingly, in the presence of disorder, 
the transition becomes smoother and it is in particular expected that < 0, i.e., from 
the hyperscaling relation, a correlation length exponent Ur > 2/d [l9l [50], [51]. It is however 
important to stress that these results are obtained essentially for magnetic systems and it is 
not clear to which extent they are relevant to the case here considered, concerning interacting 
polymers undergoing a first order transition characterized by a diverging correlation length. 

An analysis in terms of pseudo-critical temperatures [52] was applied recently to disordered 
PS models with different Cp values [291 (201 [38l [39l [53l [51] . In these studies, an appropriate 
sample-dependent Tc(e, A^) was defined and measured, looking in particular at the associated 
probability distribution. The results point towards irrelevance of disorder for Cp = 3/2, 
corresponding to the marginal case ap = 0. On the contrary, relevance of disorder was found 
for Cp > 3/2. Importantly, peculiar behavior was observed for the value Cp = 2.15, whereas 
the situation appears to be clear for Cp = 1.75. Indeed, for Cp = 1.75, compatible estimates for 
Ur ~ 2.7(> 2/d = 2) were obtained from the scaling of the average pseudo-critical temperature 
Tc{N) and from that of (the square root of) its fluctuations 6Tc{N). In the case Cp = 2.15, 
instead, a scaling of the average critical temperature Tc{N) ~ 1/A^ was reported, suggesting 
a still first order transition with exponent Ur^i = 1. However it was also found a scaling for 
the fluctuations following STc{N) ~ 1/A^^/^, which was associated to a different exponent 
'^r,2 = 2 (= 2/d). It was then suggested [29], [301 [381 [39] that these observations are compatible 
with a system still exhibiting a first order transition but in which scaling laws are no more 
fulfilled, characterized by two different correlation lengths: a typical one, ~ l/l?" — T^, 
describing the behavior of a typical sample in the thermodynamic limit, with Ur^i = 1, and 
an average one, .^2 ~ ^/{T — Tc)"^, describing the behavior of average quantities, dominated 
by rare fluctuations, with i'r^2 = 2/d = 2. The resulting two-sided scenario is therefore that 
disorder is irrelevant to the typical sample and, in the same time, the obtainment of the ^'^,2 
value is in agreement with the theoretical expectations [IHl [SHI [SI] • 

By contrast, usual finite size scaling analysis of Monte Carlo like simulations results on 
a 3d disordered SAW-DNA model (DSAW-DNA) [31], suggested a transition governed by 
an (average) correlation length exponent Uj. ~ 1.2. However, in this study, average energy 
curves were observed to cross at the same point within the errors in the estimations, and 
accordingly the possibility Ur = 1 = Up could not be completely ruled out. The findings were 
further confirmed by the analysis of the behavior of Pe{l, T) at the critical point, which led to 
the compatible value — 1.9 ^ 1 + l/h'j. when considering the largest sizes and taking into 
account the presence of a finite correlation length. But again, particularly with the smallest 
chain lengths, estimations > 2, still compatible with a first order transition, were obtained. 
It is noticeable that the affordable sizes in Monte Carlo like studies are significantly smaller 
(factors of order ~ 2000) than the sequence lengths accessible to off-lattice recursive canonical 
partition function calculations for PS models. 

Finally, in recent theoretical works based on a probabilistic approach [32l [33l [34], it was 
shown for a general class of interacting polymer models that the transition becomes at least 
of second order in the presence of disorder. The frame of this approach covers the PS models 
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with Cp > 3/2, including the case Cp > 2 with corresponding first order transition in the 
pure system. Accordingly, these conclusions are expected to also cover the 3d DSAW-DNA 
case. Following such studies, it is expected that z/^ > 2 both for average and typical quantities, 
though the possibility of different correlation lengths is not ruled out [S^ . On the other hand, 
according to other theoretical results in which self avoidance is neglected [l6l HT], disordered 
models a la PS could undergo a definitely smoother transition, corresponding to an essential 
singularity in the free energy. 

2.4 A possible scenario for the finite size behavior 

Before presenting our numerical findings, it is worth discussing qualitative features expected, 
at fixed chain lengths N, for the behavior of disordered PS models. As previously recalled, 
disorder should be relevant to these models as soon as Cp > 3/2. Moreover, on general grounds, 
one can argue that the behavior of a system near the transition point is governed by given 
critical exponents which do not depend on model details. Correspondingly, one would expect 
that both the form of -P(ej) and the precise choices for the parameters (here R = ^gc/^at, 
and GC composition) should not correspond to different thermodynamic limit singularities. 
On the other hand, such choices could have strong influence on finite size effects. 

The disordered PS model in [23 ED] and the 3d DSAW-DNA in [21] involve the same 
Cp = 2.15, either as a direct input for the recursive calculations or as consequence of the im- 
plementation of self-avoidance in the simulated model. Nevertheless, there is a first noticeable 
difference between the two systems studiecil], as the off-lattice calculations were inspired from 
a wetting transition model [51], in which it is forbidden for two consecutive elements to be 
in the closed state simultaneously {i.e., in our notation, if Sj = 1 then Sj+i = 0). Also the 
connectivity constant fi is not the same, since it was fixed to the value /i = 2 in [2^ [50] , 
whereas it is an output of the model in the on-lattice simulations, and one finds ^ 4.7 for 
SAWs on a 3d cubic lattice. In addition, the two studies involved significantly different R 
values. In [29l [30] the choice R ^ 1.098 was adopted, for obtaining a critical temperature 
ratio Tc^GC lTc,AT close to the experimental value. On the other hand, in [31], the value R = 2 
was studied in detail and the values R = 4 and R = oo (corresponding to the choice e^c = 1 
and Cat = 0) were also considered. In the latter study, preliminary results suggested that the 
(average) correlation length exponent Uj. could increase with R, ranging from Ur ^ 1.18 for 
R = 2 to Ur — 1-33 for R = oo. 

For proper understanding of these findings, and for a qualitative analysis of the expected 
finite size behavior, it can be important to consider in some detail the potential key role 
of rare regions in the generated sequences. Indeed, the possible presence of such regions, 
of large enough size, can explain the presence of strong corrections to scaling, which could 
therefore depend both on model details and on the precise choice for the parameters. It can be 
noted that temperature and disorder appear only in the tTj = exp[sj(/3ej — log//)] terms in the 
partition function, which are clearly invariant under the transformation (ej aei, T — > aT). 
Moreover, in the pure system, the transition occurs around the temperature T^^p ~ e/log/i, 
at which the energetic contribution for the two bound chains is of the same order than the 

^We thank Thomas Garel for pointing this difference to our attention 
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entropic loss. In the presence of disorder, for a given sequence, it is expected to observe the 
multi-step behavior in 6^{T,N) displayed by experimental DNA denaturation curves. This 
results from the presence of regions with different local contents in terms of GC to AT ratios, 
associated accordingly with different local melting temperatures. 

In the simplest extreme case, one imagines two regions A and B, of about the same 
length L, completely dominated by AT and GC compositions respectively. In such situation, 
the local transition in region A is driven by e^T energies, with local critical temperature 
Tc,ioc{A) ~ Tc^AT ~ Cat/ log /i, whereas the local transition in region B is associated with 
the higher local critical temperature Tc^ioc{B) ~ Tc^gc ~ ecc/log/^ ~ RTc,ioc{A). In this 
illustrative example, for a given temperature, the contributions in the partition function of 
total vr^B factors corresponding to the configurations with base pairs in the closed state, will 
be significantly different for A and B regions. One obtains, for T = T(.^ioc{B) and Sj = 1 V? G 
A,B: 

< = \{TTr=eML{^,loc{B)eAT-\ogli)]^0{l), 

= l[7ri = exp[L{f3,joc{B)eGc-logfi)]^exp{-L/x). (13) 

ieB 

Since, whereas Pc,iociB)eGc — log/i ~ 0, one has 

Pc,ioc{.B)eAT - log/i log/i—— = — , (14) 

n X 

defining the parameter x. From these expressions it is possible to argue that the larger the 
value of L with respect to x the more the effect of disorder will be felt by the finite size 
system, i.e. the difference between the weights of configurations corresponding to closed A 
and B regions in the partition function will be higher. On the other hand, the probability 
for such an extreme case in a particular sequence of size N is quite small. With the choice 
for P{e) in ffTTl) . large N and L values and 2^ ^ iV ^ L, the probability of L contiguous 
elements of the same type is simply ~ Therefore this probability, though approaching 

1 in the thermodynamic limit for any finite L, becomes rapidly negligible with increasing L 
for fixed chain length A^. Following these considerations, for the finite size system to feel the 
effect of disorder, the chain lengths necessary for observing rare regions with L ^ x could be 
not reachable for large x values. For more quantitative analysis, at least in the extreme case 
considered, let us suppose that at the length scale A^i a region of size Li is observed for the 
parameter value Xi, such that Li/xi ^ 1, with non negligible probability Then, in 

order to make the same observation for X2 > Xi, it will be necessary to consider length scales 
of order N2 ~ Ni exp [{X2/X1 — l)Li log 2], thus involving exponential increases with the ratio 
X2/X1 for the sequence lengths A^. 

For attempting to understand the role of the long-range loop entropic effect, one can 
impose that, at the temperature Tc^ioc{B), the weight of the configuration associated with 
region A in the closed state is significantly smaller than that of the configuration associated 
with an open region corresponding to a single loop (of size L), getting the condition L/x ^ 
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CplogL. Correspondingly, one can argue that, with increasingly larger Cp values, increasingly 
larger rare region lengths will be necessary for observing cooperative melting behavior at 
different temperatures. Moreover, the considered extreme case seems particularly appropriate 
for a qualitative description when Cp > 2. Here, the first order character of the transition in 
the pure system and the corresponding favored formation of small loops suggests that larger 
differences of local AT to GC content ratio are necessary for obtaining different local melting 
temperatures. 

In conclusion, important finite size corrections to scaling are expected qualitatively, which 
could in particular depend strongly on the parameter x introduced above, involving both the 
energy ratio R and the connectivity constant /i. Specifically, the effect of disorder on the 
behavior of the system could become evident only for chain lengths diverging exponentially 
with X. This parameter seems therefore to play the role of an intrinsic length scale for the 
rare regions, corresponding to the logarithm of an intrinsic length scale for the system itself. 

Setting aside other differences, the parameter choices in [29], [30] correspond to x ~ 15, 
whereas in [31] they give x ^ 1.3. It is accordingly possible that results in the two studies can 
be explained following the described picture, on the basis of the underlying finite size effects. 
It is nevertheless to be noted that in [2^ [3U] very large sequences, up to ~ 2000000, were 
considered. The present qualitative picture could anyway explain the observations in [21] , for 
an apparent dependence of Vr on i?, as an increase in R at fixed log yU amounts to a decrease 
in X. It is possible that the chain lengths < 800 affordable in the simulations were not large 
enough and that also the value — 1.33 obtained with R = oo was affected by finite size 
corrections to scaling, being therefore to be interpreted as a lower bound for the (asymptotic) 

3 Numerical study 
3.1 Details of the model 

We consider the disordered PS model, with Cp = 2.15, described by the canonical partition 
function: 

^^.=En-"'*-'°""n(a^. (15) 

where Y\k I'uns over all the loop lengths Ik > 2, associated with a given configuration of the 
{si}. Importantly, here and in the following we only impose the condition ^^(4 — 1) + 
Xli ~ ^tot + ^tot < A^; thus allowing for free ends, with separated strands not involved in 
a loop at one end-extremity of the sequence. Such free-ends are not expected to modify the 
thermodynamic limit behavior [IHl [121 [29] , but the high temperature phase of the model will 
consist accordingly of two strands linked only at i = 1 (instead of a large loop, for a system 
with both extremities required to be in the coupled state). This so-called bound-unbound 
{hu) model corresponds more closely to the one studied usually in on-lattice simulations 
[131 [H [151 [31], and it was also adopted in [291 [30]. We notice that the factor ^'^i^ -^tot-ntot) 
cancels out when looking at Z* = Z/fi'^'^. 
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In the present study, we are moreover implicitly adopting the value a = 1 for the coop- 
erativity factor, where the parameter a gives a measure for the barrier to overcome for the 
initiation of a loop opening. In realistic sequence-specific calculations [22l |25l [26], [271 [28] . 
one uses typically a = O(10~^). It is however not clear what choice for a is appropriate 
when Cp = 2.15 since in experimental/theoretical comparisons an exponent Cp ~ 1.8 is gen- 



erally taken and in a recent study it was suggested that the values for a and Cp should 
vary in parallel in order to reproduce correctly experimental melting curves. We note that 
small a values could increase corrections to scaling, whereas this parameter is not expected 
to influence the thermodynamic properties. 

Disordered PS models can be solved numerically by writing down recursive equations for 
the partition function with a SIMEX scheme [U \5[ [23, [21] , taking into account efficiently the 
long-range entropic loop weights. A basic idea in the recursive scheme is that the forward 
partition function Z/ (p -|- 1), which accounts for all configurations up to position p + 1 along 
the sequence with both base pairs at positions i = 1 and z = p + 1 in the closed state 
(si = Sp+i = 1), can be obtained from Z^{p): 



Z/(p + 1) = exp(/?ep+i - logp) 



p-i 



zRp') 



We can similarly write down the equation for the backward partition function: 



(16) 



Z'^ip - 1) = expiPe 



p-i 



logp) 



p'=P+i 



(17) 



with the last term in the equation above corresponding to the free-end configuration (sj = 
for i > (p — 1)). The canonical partition function for the complete chain of length is given 
by: 



TV 



p=i 



where the forward sum takes into account the free-ends. With these calculations one obtains 
in particular the probability for a base pair at position i (along the sequence of length A^) to 
be in the closed state as: 



Zt^exp(/3ei - logp)' 



(19) 



where the division with exp(/?ej — log p) is to rectify the double counting of the corresponding 
factor (involved both in Z/ and in Z^). 



3.2 Measured observables 

For a given disorder sequence {ej, i = 1, . . . , A^}, at fixed chain length A^ and temperature 
T, we can derive from the Ve{i,T, N) = (sj) f[T9l) quantities of interest such as the density of 



14 



closed AT base pairs Oat,^ (respectively GC, Occt), the total density of closed base pairs 9^ 
and the energy density e^: 



OgcAT,N) 

e,iT,N) 

e,(T,iV) 



SAT / ieAT 
ieGC I i&GC 

eATAT,N) + eGcAT,N) 

- [eATOATAT, N) + eccOGcAT, N)] 



N 



(20) 



(21) 

(22) 
(23) 



We can also consider the specific heat as well as the derivative of the density of opened 
base pairs cg ^, which is relevant to experimental determinations: 



Ce{T,N) 

C9AT,N) 



1 de,{T,N) 

r2 dT 

_ 1 d9,{T,N) 

~T2 dT ' 



(24) 
(25) 



Since R ^ 1, the energy density and the order parameter 6^ can exhibit different behaviors, 
and accordingly such can be also the case for q and c^^e- In the same direction we consider 
also the susceptibility, obtained as: 



Xe(T,iV) = - 



1 



N 



N 



, i=l 



J=l 



dOATAT.N) ^ deccAT.N) ^ dpATAT^N) , d^GC,e(T,iV) 



de 



AT 



dc 



GC 



de. 



GC 



de 



AT 



(26) 



providing interestingly a possibility for checking numerical accuracy in the computations 
from the fulfillment of the equality dOATA'^^ /d^GC = dOccA'^y / d^AT- 

We study moreover the behavior of the (non-normalized) loop length probability distribu- 
tion: 

^"'~'z/(z)Z,^(^ + / + l) 



i=l 



(27) 



with N{1) = 1/{21Yp, independent from the disorder sequence. Therefore we introduce the 
quantity: 

P*{l,T,N) = {2iy^P,{l,T,N). (28) 

noting that in the pure model P*{l,T,N) oc exp(— //,^(T, A^)) (see ([9])) and correspondingly 
P*(/, T, N) const for T ^ T~ . 

As a first step in the analysis of such data, we will check the validity of standard finite 
size scaling for quantities averaged over disorder. With the usual definition of the critical 
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exponents and y = {T — Tc)N^^'^'' , one expects: 

e.(r,iV) ~ N'^-'^-'eiy) (29) 

Ce{Tc,N) ~ iV2/'^'-i£(y), (30) 

^e(T,iV) ~ N-^^/^^eiy) (31) 

x.(r,iv) ~ N^'-^-'^xiy), (32) 

where it is possible to find /J^/i^r 7^ 1 — V'^r ^'^'^ 7^/2^,. 7^ 2/1^,. — 1 = a^lvr- For the average 
loop length probability distribution, we still look for a behavior at the critical point described 
by a power law as in the pure case: 



P,(/,Te,iV) ocl/f% (33) 

and therefore 

P*{l,T,,N) oc r^-'^^ (34) 
Interestingly, based on this relation, it should be particularly simple to seek numerical evidence 

for Cr Cp. 



3.3 Computational details 

We resort to the SIMEX scheme O [231 [21], based explicitly on partition function evaluations, 
instead of recursions for specific conditional probabilities as in |3]. Besides this conceptual 
difference (important for generalizations, notably to higher-order models), for the simple 
helix-coil model in linear molecules, as considered here, the reduction of the computational 
complexity by one order of magnitude in the SIMEX method relies on the numerical rep- 
resentation of the long-range effects in the model as a sum of Ns exponentials, as already 
formulated in the FF method. The other important ingredient in the FF, also implemented 
straightforwardly in the SIMEX, corresponds to a forward-backward scheme as described in 
Section 3.1, classical in dynamic programming and associated with an additional order of 
magnitude reduction in complexities. For the linear case, the complexities for a complete 
probability map calculation reduce overall from A^^ (for a one-way progressive treatment) to 
NsN. 

In order to make the scheme operational in practice, it is necessary to obtain appropriate 
numerical representations for long-range effects as sums of exponentials. The general nu- 
merical problem associated with the analysis of multiexponential functions is notoriously a 
delicate one. It covers two distinct -in principle- situations, concerning either identifications 
or approximations, the relevant case for the present study. In the identification situation, it 
is necessary to recover the correct number of exponentials, and of course the correct associ- 
ated parameters, from curves (usually experimental) supposed to be of multiexponential type 
for theoretical reasons. A general solution to this problem is provided by the Pade-Laplace 
method [SB] , requiring no a priori hypotheses for the identification of components in sums of 
general exponentials (real and/or complex). This formulation encompasses, and generalizes, 
in a unified frame, a series of solutions since Prony's method and the so-called method of mo- 
ments [57]. Even though originally formulated as an identification approach, the numerical 
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application of the Pade-Laplace method to power-law functions (such as for loop-entropies 
here) revealed an identification-like behavior in this approximation problem, in the sense that, 
for given maximal long-range lengths, a fixed number of significant exponential components 
are obtained. For example, for a series of biologically-oriented studies, an approximation of 
with Ns = 14 exponentials was shown to be appropriate (with further refinements of 
the parameters with least-squares procedures) In the present study, in order to be in 

strictly comparable conditions with this respect, we adopt the numerical representation of 
1//2-15 ^j^]^ ^ exponentials in [29] . 

For the numerical computation of the recursive equations for the forward and backward 
partition functions, it is moreover necessary to avoid underflow/overflow problems. For this 
purpose different schemes can be implemented [3 IM] , in order to normalize the numerators 
and denominators the ratios of which are involved in the evaluation of probabilities 0191) . We 
consider here the normalization described in [2l], based on the introduction oi free energy like 
quantities for the handling of the logarithms of Z/ and Z^. The details of the implementations 
are provided in the Appendix, along with the description of boundary conditions. 

We study extensively the case x = R/[\ogfi{R — 1)] = 1.3, using the same energies and 
connectivity constant as in [HT]: R = 2 and log/i ~ log fisAw — 1-54, in three dimensions. 
We consider sequence lengths ranging between = 100 and = 20000. For x = 1.3, such 
A^-values appear to be large enough both for the clarification of the thermodynamic limit 
behavior and for the study of corrections to scaling. On the other hand, the numerical com- 
putations are reasonably fast up to this length, which makes it possible to consider closely 
spaced temperatures and to obtain correspondingly, with negligible numerical errors, quanti- 
ties related to derivatives, such as in particular the values of the maxima of the specific heat. 
In detail, for given chain lengths, we consider A/r = 250 different T-values, equally spaced 
in intervals [Tmin{N),Tmax{N)] around the corresponding Tc{N), evaluated roughly from the 
position of the maximum of the average specific heat in some preliminary results. For the 
different chain lengths, the number of samples A/1(A^) as well as the Tmin{N) and T^axiN) 
temperatures are detailed in [Tab. 1]. 

It was checked in particular that with the various choices above the errors on the maximum 
of specific heat and of susceptibility were both consistently smaller than fluctuations between 
samples. Without any loss of generality we set in all calculations cat = 1, i-e. temperature is 
in e^T unities. The evaluation of the susceptibility is obtained by numerical derivations with 
respect to e^r and e^c (see (|2B]1 ). with Scat = /3 ■ 10~^ and 6eGc = R^^at, which ensures 
the desired numerical accuracy at all temperatures. Finally, in all calculations the errors on 
average quantities are computed from sample-to-sample fluctuations. 

4 Results and discussion 

4.1 Given sample-sequence and different x values 

We consider first the qualitative behavior of the model for a given sample-sequence of length 
A^ = 10000, with different x = i?/[log/i(i? — 1)] values. Results shown in this Section are 
obtained with a particular disorder configuration. It was however checked that the corre- 
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1.0 


1.16 


1000 
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1.0 


1.15 


2500 


1000 


1.02 


1.14 


5000 


1000 


1.02 


1.14 


7500 


1000 


1.04 


1.12 


10000 


600 


1.04 


1.14 


15000 


500 


1.04 


1.12 


20000 


500 


1.04 


1.12 



Table 1: Tabulation of the chain lengths A^, the number of samples Ms{N) and the range 
of temperatures \Tmin{N) .T^ax^N)] in the numerical computations. For each disordered se- 
quence Mt = 250 equally spaced temperatures, in the corresponding intervals, are considered. 



sponding qualitative observations are also valid with various arbitrarily chosen sequences. In 
order to cover different significant situations, the following values for the parameter x were 
considered: x = 2, 1.3, 1.4, 0.7, respectively. Notably, the choice x = 2 is for comparisons 
with the results in [291 EO]. In detail, we used the same R = 1.098, and we set in addition 
log/i ~ 5.55, for obtaining close critical temperatures in the pure case. The choice x = 1.3 is 
for compatibility with the conditions in [31], and accordingly we set in this case R = 2 and 
log/i ~ 1.54. On the other hand, the choice of the close value x ~ 1.4 is following parameters 
setting usual in comparisons with experimental results [221 ESI [211 ET] . In this latter case, the 
value for x is not related to a large R value, but rather to a large average log/i ~ 12.35 (in 
kB unities). It can be noticed that typically used coupling energies lead to i? ~ 1.062, as ob- 
tained by averaging over the different stacking energies for neighbor base pairs j+i. Finally, 
for clarification of potential differences resulting from choices of large R or alternatively large 
fj, values, X ~ 0.7 was retained as corresponding to the two choices for {R, log/i) couples: 
R ~ 1.062, log/i ^ 24.5 (case a); and R ~ 18.38, log/i ~ 1.54 (case b). 

We plot in [Fig. [1] the susceptibilities Xe(T, x, N) for the sample-sequence for the different 
X values. We observe that the results depend strongly on x. However, the shapes of the 
curves obtained with x = 1.3 and x = 1.4 appear to be strikingly similar. Moreover, also the 
two curves corresponding to the two different cases associated with x = 0.7 (case a and case 
b) are qualitatively similar. 

The results here are in agreement with the overall picture given in Section 2.4, with 
indications for an x-dependent finite size behavior. The extreme case considered there, with 
pure AT and GC regions, is clearly a very rough approximation of a typical sequence. It is 
nevertheless clear that the parameter x appears to capture some essential ingredients of the 
model. 
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Figure 1: Susceptibility curves for a given sample-sequence of length = 10000 and different 
X values (see text). 
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Figure 2: Scaled susceptibility for a given sample-sequence of length = 10000 and different 
X values. XeRlW'{R + 1)] data are plotted as function of [T — Tc(e, x, A^)] log /i/ (i? — 1), with 
Tc{e,x,N) the temperature associated with the absolute maximum of susceptibility. In the 
case a with the value x = 0.7, specific choice was needed for defining the pseudo- Tc, and 
we looked in this case to correspondence of positions of the peaks with the other curves, by 
taking it as the temperature of the second maximum. 



For more quantitative analysis, we note that for N ^ oo one expects a transition tem- 
perature close to ~ (Tc^AT + Tc^Gc)/'^ ~ (i? + l)/(21og/i). For a given finite sequence, a 
pseudo-critical temperature Tc(e, x, N) must be adopted, as the critical temperature is not well 
defined. Here we take as Tc{e,x,N) the temperature associated with the highest maximum 
value for susceptibility [52]. In particular, with such a choice, we find that the behavior of 
XeiT, X, N) displays some scaling when plotted as function of [T — Tc(e, x, N)]Tc{e, x, N)/{R — 
1), for different R and x values. We present in [Fig. [2] correspondingly scaled Xe data (also 
multiplied by the factor R/[x'^{R + 1)], for obtaining close behaviors in the high temperature 
phase) . 

This figure further makes evident the dependence on x for finite size systems. The results 
suggest that, at fixed length scale N, for large x, and in particular here already for the value 
X = 2, the system exhibits typically only one very sharp peak. This observation should be 
related to the fact that the probability of large enough rare regions is negligible (though one 
could still encounter such a case when considering a large number of sequences), and the 
system behaves essentially as a pure model with e = (e^r + eGc)/2 = {R+l)/2. For smaller x 
values, we observe on the contrary an increasing number of peaks, with decreased sharpness. 
This finding is coherent with the qualitative picture following which, with increasingly smaller 
X values, rare regions with increasingly smaller lengths L are sufficient in order to observe 
multistep behaviors, since the relevant quantity should be the ratio L/x. Nevertheless, for 
the smallest considered value x = 0.7, obtained with two different parameters choices, we 
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observe the same number (four) of peaks, but the position with respect to the other peaks for 
the absolute maximum of the curve is shifted for x = 0.7 a as compared to the corresponding 
one for x = 0.7 b and the larger x values. This confirms that the introduced x describes the 
finite size behavior only approximately. 

The changes in the behavior of a typical sample, as described above, are also expected 
at larger sequence sizes for a given x-value, since this parameter appears to behave as (the 
logarithm of) an intrinsic length scale of the system. Some numerical evidence in this direction 
was already given in [31] for the on-lattice DSAW-DNA, where the qualitative analysis of the 
specific heat suggested the appearance of multi-peaked curves only for large enough A^-values, 
and an increasing number of peaks in typical sequences as function of A^. Here we obtain 
the same qualitative results for the value x = 1.3 studied in detail. Nevertheless, detailed 
quantitative bases for such conclusions are left for future work, with an extensive study of the 
finite size effects for different x values. 

It is to be emphasized that in the suggested picture it is the behavior of the typical finite 
size sample which is expected to change when varying x, and therefore one would not expect 
different typical and average correlation lengths in the thermodynamic limit, though the sizes 
necessary for confirming this hypothesis could be out of reach for large x values. An analysis in 
terms of pseudo-critical temperatures is clearly necessary to distinguish between this situation 
and the alternative one proposed in [291 EOl EH [39] , which is left to a forthcoming work 00] . 

4.2 Behavior of average quantities 

In what follows, we investigate the behavior of average quantities for x = 1.3 (obtained by 
setting R = 2 and log /i = 1.54). The average energy density e^iT, N) and the average closed 
base pair density 6^{T,N), for different chain lengths, are represented in [Fig. [3]. It is clear 
that the system undergoes a phase transition in the thermodynamic limit, with the energy 
density and the order parameter going from zero at high temperature to finite values below 
Tc. We can moreover observe from [Fig. [3] very similar behaviors for the two quantities 
(apart from the sign difference) and we expect to find, as a consequence, Pr = Ur — 1 and 
7r = ar = 2 — Ur- We have also checked that the average densities of closed AT 9AT,e(T,N) 
and GC 6Gc,e{T, N) base pairs exhibit qualitatively similar behaviors. 

Nevertheless, the data do not agree with the corresponding expected scaling laws (see ( l29i) 
and (13T!) ). making clear the presence of strong corrections to the asymptotic behavior. Even 
though it is therefore difficult to evaluate the critical exponents, the fact that the curves do 
not cross at the same point (as particularly evident for the largest sizes) suggests a transition 
with (an average) Ur > 1 {= Vp). More in detail, the energy density and the order parameter 
appear to converge both towards functions which are continuously vanishing at the critical 
point and possibly also differentiable, which would imply a transition at least of second order. 

Still better evidence for a smooth transition comes from the average specific heat Ce(T, A^) 
data, plotted in [Fig. 0]. One can notice from this figure that the maximum appears to diverge 
with the chain length for the smallest sizes, but saturates for larger A^-values, as expected for 
a critical point characterized by a,. < 0, z.e. > 2/(i (= 2). Interestingly, the qualitative 
behavior for chain lengths smaller than ~ 1000 appears to be similar to the one found in 
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Figure 3: Averaged energy density e^{T,N) and averaged total density of closed base pairs 
9^{T,N), for various chain lengths, plotted as functions of temperature. 



[3T] . This observation suggests that the on-lattice DSAW-DNA and the off-lattice disordered 
PS model considered could exhibit the same kind of finite size effects, also supporting the 
hypothesis that the value z/^ ~ 1.2 obtained from the Monte Carlo like numerical simulations 
represents a lower bound for the average correlation length critical exponent. 




N 



Figure 4: Average specific heats Ce(T, A^), for the chain lengths considered, plotted as functions 
of temperature. 



We find a very similar behavior for the susceptibility Xe{T, N), represented in [Fig. [5], as 
well as for the derivative with respect to the temperature of the total density of closed base 
pairs C0^^{T,N) (not shown). These findings further suggest that these quantities, as well as 
the energy and the order parameter, are described also in the disordered case by the same 
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Figure 5: Averaged susceptibilities Xe(T,N), for the chain lengths considered, plotted as 
functions of temperature. 
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Figure 6: Average P*{l,Tc{N), N) = {21YpP,{1,Tc{N), N), for the chain lengths considered, 
plotted as functions of the loop lengths / at temperatures Tc{N), corresponding to maxima 
of average specific heat. Normalization factors are arbitrarily chosen in order to make values 
comparable at small /. 
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critical exponents. Again, it is difficult to evaluate the exponents by applying standard finite 
size scaling analysis to the data, because of the obvious presence of strong corrections to the 
expected laws (here fl30|l and fl32l) ). 

Finally, [Fig. [6] displays data for P*{l,Tc{N), N) (see f l28l) ) at the size-dependent critical 
temperatures Tc{N), identified here with the temperatures associated with the maximum 
of the average specific heat. We note also in this case a strongly A^-dependent behavior. In 
particular, the considered quantity is nearly constant in the range 1 ^ / ^ for the smallest 
sizes, whereas for the largest ones, expected to be the most meaningful, it is increasing with / 
rather linearly (on logarithmic scale) on a wide ranged interval, which should mean that the 
expected power law representation (133|) is valid, but with Cr <^ Cp. 

We notice that the observed strong A^-dependence of averaged quantities and the fact 
that data do not obey usual scaling laws on the whole A^-range studied is in agreement with 
the qualitative picture given in Section 2.3, clearly suggesting that, at fixed x, the effect of 
disorder becomes evident, and the system reaches the asymptotic behavior, only for large 
enough size values. From this point of view, in particular the saturation of the specific heat 
and susceptibility maxima can be related to the appearance of a larger number of less sharp 
peaks (apparently in the typical sequences) for increasing A^, in analogy with the behavior 
discussed in the previous Section when decreasing x at fixed chain length. 

4.3 Critical exponents and corrections to scaling 

We consider in terms of quantitative analysis data for the maximum of the average specific 
heat [Fig. [7^.] and the maximum of the average susceptibility [Fig. [7b] , as functions of chain 
lengths A^. Using the law c^{T, N) oc N"^/'^^^ which is a particular case of (l30l) . we obtain 
from the analysis of these data > and correspondingly z/^ < 2 when considering only 
the smallest chain lengths. In detail, the exponent would be still compatible with the value 
Ur = Up = 1 characterizing a ffist order transition, for A^ < 1000. For both c^(T,N) and 
Xe{T, N) the asymptotic saturation becomes obvious only for sizes larger than A^ ~ 5000. 
Following [52], we consider a fit of the data to the form gi — g2N'^, with gi,g2 > and where 
the exponent is Cc = ar/i^r for the specific heat and = 'jr/t^r for the susceptibility. Using 
the whole data sets in the fits, negative exponents, compatible with zero within the errors, 
are obtained in both cases. On the other hand, with corrections to scaling, strictly negative 
exponents are obtained. Letting aside the two smallest sizes (A^ = 100 and A^ = 200), the 
best fit in the case of c^{T,N) corresponds to Cc = a^lvr = —0.3 ± 0.1, and we get a 
compatible value for = 7^/ I'r from xiT, N) . This result con&ms that = 7r and, 
from the hyperscaling relation, it implies z/^ ~ 3. Intriguingly, the correlation length exponent 
value is close to the one obtained for the disordered PS model with Cp = 1.75 considered in 
[5U] . We notice that limiting the analysis to A^ > 500 we obtain a still larger Ur, but the 
statistics in our study do not allow more accurate evaluations. Nevertheless, the important 
point is that, when looking at average quantities, it clearly appears that the transition is at 
least of second order and, at the same time, the crossover between a pure system like behavior 
for small sizes and the (apparent) asymptotic one is quantitatively con&med. 

For further validation of this result, and for a better understanding of finite size corrections 
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Figure 7: Data on the maximum of the average specific heat c(T, A^) and on the maximum 
of the average susceptibihty x(T, N) , plotted as functions of chain lengths, together with 
the best fits to the law gi — g2N^ with gi^g2 > 0. 
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Figure 8: Evaluations of the finite size critical exponent Cr{N), plotted as function of 
The results are from the fits of the probability distribution of the loop length to the expected 
law ([3lD P*{l,Tc{N),N) oc /^f"^'' (with Cp = 2.15 and Tc(iV) taken as the temperature for 
which the average specific heat is maximum). For each chain length, data are fitted in the 
/-range (with Z > 2) in which P*(l,Tc{N), N) is increasing with /. Here the errors are only 
indicative, as they depend strongly on the number of points in the range of the fit. 
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Figure 9: For the different chain lengths considered, data for log P* {I, Tc{N), N), at temper- 
atures Tc{N) for which average specific heats reach their maxima. The expected asymptotic 
behavior (2.15 — Cr) log / + b with Cr = 1.4 (i.e. ~ 1 + l/t'r with Ur obtained from the fit of 
the maximum of the specific heat) is also plotted. 



to scaling, we consider the fits of P*{l,Tc{N), N) to the expected behavior oc I'^p"'^^ ( |34l) . We 
get correspondingly the sz2e-(iej»en(ieni estimations of the critical exponent Cr{N), which are 
presented in [Fig. [8]. Here the (finite size) critical temperatures Tc{N) are taken as the 
temperatures for which the average specific heat reaches its maximum and we disregard the 
possible presence of a finite correlation length. We fit data by considering only the /-range 
(with / > 2) in which P*{1, Tc{N), N) is an increasing function of / (see [Fig. [6]). The obtained 
values of Cr{N), and therefore of l/urlN) = min{l, Cr{N) — 1}, for different chain lengths are 
definitely not compatible within the (though indicative) errors and one observes a clear trend 
towards decreasing Cr{N) values for larger chain lengths. It is in particular interesting to 
notice that, for = 200, we still have Cr{N) ^ 2.1 and correspondingly Vt{N) = 1, whereas 
for ~ 1000, we obtain Cr{N) ~ 1.8 -h 1.9, again in perfect agreement with the results of [31] 
concerning the 3d DSAW-DNA on-lattice model. On the other hand, with the study of larger 
chain lengths it becomes clear that the transition is at least of second order with > 2. In 
fact, for the largest size considered A^ = 20000, the exponent obtained is Cr{N) ~ 1.5. This 
implies that the correct evaluation of the (asymptotic) critical exponents, apart from finite 
size corrections, as Cr = limTv^oo Cr(A^) leads to Cr < 1.5. This value can be compatible with 
Cr = 1 + l/i'r ^ 1-35 obtained from the maximum of the specific heat. 

It is nevertheless important to stress that the above results concern average quantities. 
The thermodynamic limit behavior of the typical sample could be therefore blurred by the 
fluctuations of the sequence- dependent pseudo-critical temperature, if they are governed by 
an exponent h'r,2 > I'r,! as suggested in [291 [30l [381 [39] . K is possible that we are only looking 
at the average correlation length C,2{T,N), and in particular the value = 1.5 would be in 
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agreement with the result ^^^2 = 2 in [3D]. 

Accordingly, we consider also data on \og P* {I, Tc{N), N), evaluated by averaging over 
disorder after taking the logarithm. To be precise, following [30], the quantity expected to be 
described by the typical correlation length is log{[Z/ (i, N)Z^{i + /, N)]/Z*j^}. Here we 

are instead considering a kind of mixed average, but the behavior of log P*{1, Tc{N), N) should 
anyway display differences with that of logP*(Z, T, A^) in the presence of two distinguishable 
correlation lengths. On the contrary, the comparison between [Fig. [9] and [Fig. [6] shows 
that, at least for T = Tc{N), logP^ and logP^ behave similarly. In particular, logP^* is also 
an increasing function of / for the largest considered sizes, and on quantitative grounds the 
evaluated Cr{N) values are essentially compatible within errors. In order to emphasize this 
point, the expected asymptotic behavior logP* ~ (2.15 — c^) \ogl + b with c,. = l + l/z/,. ~ 1.35 
(obtained from the average specific heat) is also plotted in [Fig. [9]. More quantitative analysis 
of these results are left for a forthcoming work [40]. 

5 Conclusions 

We studied numerically a disordered PS model for DNA denaturation with Cp = 2.15, which 
displays a first order transition in the homogeneous case, by solving recursively the equations 
for the canonical partition function with the SIMEX scheme. The model is made as similar 
as possible to the 3d DSAW-DNA previously studied by Monte Carlo like simulations [31] 
and it is expected that the results of the study could also be relevant to different disordered 
PS models with Cp > 2. 

We introduced the parameter x = P/[log/i(P — 1)], where R = eac/^AT is the ratio of 
the Guanine-Cytosine to the Adenine-Thymine coupling energies and fi is the connectivity 
constant of the corresponding on-lattice model. We showed that this parameter, at fixed Cp 
value and GC composition (taken to be 1/2), appears to play the role of (the logarithm of) 
an intrinsic length scale, and to describe, in a first approximation, the finite size behavior. In 
particular, for a given A^-value, the manifestation of the effect of disorder appears to be the 
most evident for the smallest x values, in agreement with a qualitative explanation based on 
the possible occurrence of large enough rare regions. It is interesting to notice that, within 
this picture, the system size necessary for observing the asymptotic behavior and making 
evident the effect of disorder diverges exponentially with x. 

We studied in detail the value x = 1.3, obtained with R = 2 and log/x = 1.54 (as in 
[ST]), for sequence sizes up to A^ = 20000 (larger than the sizes accessible to Monte Carlo like 
simulations by a factor 20). We found that the model exhibits strong corrections to scaling, 
displaying a crossing between a still nearly pure system like behavior for small chain lengths 
A^ < 1000 and the observed (apparently asymptotic) large one for A^ > 5000. In particular, 
the maximum of the average specific heat, which behaves as the susceptibility, increases with 
A^ for small chain lengths. Considering the whole size range, it appears instead to be clearly 
saturating. This result shows that, at least from the point of view of average quantities, the 
thermodynamic limit is described by a random fixed point with ar < and correspondingly 
Ur > 2/(1 = 2. By fitting data with a scaling law of the form gi — g2N°'''^'^^, and by taking 
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into account corrections to scaling, we find in particular a^jv^ = —0.3 ± 0.1, which by using 
the hyperscaling relation gives ~ 3. 

The average loop length probability distribution at the critical temperature appears still 
described by a power law at least on an interval 1 <^ / -C of the range. Upon fitting 
data according to P^{l,Tc{N), N) oc l/l'^'' one finds A^-dependent values for the exponent 
which is compatible with Cp for the smallest sizes whereas when looking at the whole N 
range it appear to converge towards an asymptotic limit < 1.5 (possibly compatible with 
Cr = 1 + 1 /vr ~ 1.35). Moreover, logP(/, Tc{N), N) exhibits also a similar behavior, suggesting 
that there is no difference between typical and average correlation lengths. 

Our best-fit estimation, Ur ~ 3, is close to the estimation in [30] for the case Cp = 1.75. This 
observation would support the hypothesis that disorder is relevant as soon as Cp > 3/2 and 
that the various disordered PS models considered could be described by the same random 
fixed point corresponding to a transition which is at least of second order (and probably 
smoother), in agreement with recent analytical findings [321 [33], [M]. Our statistics do not 
allow nevertheless to completely rule out the possibility that i>r = 2, particularly from data, 
and in any event an analysis in terms of pseudo-critical temperatures [521 [291 1201 (Ml ISHl [39] 
is in order for clarifying the situation. We leave this development to a forthcoming work [10] . 

It is also interesting to notice that there are very recent theoretical studies [SSI ISHl 120] 
on the loop dynamics in PS models, which in particular relates the equilibrium loop length 
distribution probability to the correlation function, therefore suggesting a new intriguing 
method for measuring experimentally the c value. From this point of view, the expected 
behavior of P{1) in presence of disorder and the possibility of observing differences between 
the average and the typical sequence cases seems to us important questions to be clarified. 

In conclusion, our results provide numerical evidence for strong finite size corrections to 
the asymptotic behavior of the disordered PS model considered. The data show moreover 
that disorder is relevant, at least from the analysis of average quantities. The findings here 
appear also to confirm that the evaluation Ur ~ 1.2 in previous numerical study concerning 
on-lattice 3d DSAW-DNA model [21] is to be considered as lower bound for the correct 
(average) correlation length exponent. The observed behavior is in agreement with a proposed 
qualitative picture for finite size effects, which could also explain the difference with the results 
of previous studies on a different disordered PS model with the same Cp [291 EHj • A preliminary 
presentation for part of the findings and hypotheses here can be found in [61j . 
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Appendix 

We use in the numerical computations the SIMEX implementation of the FF scheme 
which relies on the numerical approximation of the powers l/l'^^ by sums of exponentials: 



(20 



Ns 

Jy- - ^(^k exp{-2lbk). (35) 



k=l 



In the present study we consider Cp = 2.15 and the values for the coefficients at and bk, 
with Ns = 15, provided in [29]. The computation of the recursive equations for the forward 
and backward partition functions were implemented with the introduction of free energy like 
quantities, in order to handle logarithms of Z/ and (following [24j). 
In detail, the equation f|T6|) for the forward partition function becomes: 



Zi{p + 1) = exp(/3e,+i - log/i) <^ Z/(p) + a, ^ Z/(p') exp[-2(p - p' + 1)6^] \ . (36) 

( k=i p'=i J 

By defining 

p 

Qkip) = J2 ^i^P') exp(2p'6fc) = exp[2p6fc + pfc(p)], (37) 
p'=i 

one obtains 

Z/(p) = exp[pfc(p)] - exp[pfc(p - 1) - 26fe] (38) 
and a recursion relation for Pfc(p): 

Pfc(p+1) = ^ik{p)+\og{Af + Bf + Cf) (39) 
Af = exp(-26fc) 

Bf = exp(/3ep+i - logp) {1 - exp[-26fc + pfc(p - 1) - /ifc(p)]} 

fNpF 

Cf = exp(/3ep+i - log/i) <^ ^ aj exp[-4bj + fij{p - 1) - Pfc(p)] 
Analogously, one writes the equation for the backward partition function: 

{Npp N 
^e(P) + E^fc E ^e(p')exp[-2(p'-p+l)6fc] + l 
k=l p'=p+l 

(40) 

and defines 

Rkip) = Zeip) exp(-2p'6fc) = exp[-2p6fc + Uk{p)], (41) 
p'=p 
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obtaining 

Vk{p-l) = Uk{p) + \og{A^ + B^ + C, + D^) (42) 

A = exp(-26fc) 

Bb = exp(/3ep_i - log/i) {1 - exp[-26fc + ^'fc(p + 1) - «^fc(p)]} 

Cb = exp(/5ep_i - logp) <^ ^ aj exp[-Abj + Uj{p + 1) - i^fe(p)] 

Db = exp(/3ep__i - logp)exp[-z/fc(p)]. 

We used the boundary conditions (with the imphcit assumption Z/ (0) = Z^{N + 1) = 0): 

Z/(l) = exp(/?ei-logp) 
Z/(2) = exp(/?ei - log/x) exp(/3e2 - logp) 
Z^^(A^-l) = exp(/3eAr_i - logp)[exp(/5eAr - log/i) + 1] 

Z'^iN) = exp(/3e;v-log/i), (43) 

and correspondingly: 

/ifc(l) = /3ei-log/i 

/ifc(2) = /5ei - log/i + log[exp(-26fe) + exp(/?e2 - logp)] 
i^k{N - 1) = peN - log/i + log{exp(-26fc) + exp(/?eAr_i - log/i)[l + exp(-/5eAr + log/i)]} 
j/fc(iV) = /5e;v- log/i. (44) 
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